74 research outputs found

    A Nutritional Label for Rankings

    Algorithmic decisions often result in scoring and ranking individuals to determine creditworthiness, qualifications for college admissions and employment, and compatibility as dating partners. While automatic and seemingly objective, ranking algorithms can discriminate against individuals and protected groups, and exhibit low diversity. Furthermore, ranked results are often unstable: small changes in the input data or in the ranking methodology may lead to drastic changes in the output, making the result uninformative and easy to manipulate. Similar concerns apply in cases where items other than individuals are ranked, including colleges, academic departments, or products. In this demonstration we present Ranking Facts, a Web-based application that generates a "nutritional label" for rankings. Ranking Facts is made up of a collection of visual widgets that implement our latest research results on fairness, stability, and transparency for rankings, and that communicate details of the ranking methodology, or of the output, to the end user. We will showcase Ranking Facts on real datasets from different domains, including college rankings, criminal risk assessment, and financial services. Comment: 4 pages, SIGMOD demo, 3 figures, ACM SIGMOD 201

    Appearance frequency modulated gene set enrichment testing

    Background: Gene set enrichment testing has helped bridge the gap from an individual gene to a systems biology interpretation of microarray data. Although gene sets are defined a priori based on biological knowledge, current methods for gene set enrichment testing treat all genes equally. It is well known that some genes, such as those responsible for housekeeping functions, appear in many pathways, whereas other genes are more specialized and play a unique role in a single pathway. Drawing inspiration from the field of information retrieval, we have developed and present here an approach to incorporate gene appearance frequency (in KEGG pathways) into two current methods, Gene Set Enrichment Analysis (GSEA) and the logistic regression-based LRpath framework, to generate more reproducible and biologically meaningful results. Results: Two breast cancer microarray datasets were analyzed to identify gene sets differentially expressed between histological grade 1 and 3 breast cancer. The correlation of Normalized Enrichment Scores (NES) between gene sets, generated by the original GSEA and GSEA with the appearance frequency of genes incorporated (GSEA-AF), was compared. GSEA-AF resulted in higher correlation between experiments and more overlapping top gene sets. Several cancer-related gene sets achieved higher NES in GSEA-AF as well. The same datasets were also analyzed by LRpath and LRpath with the appearance frequency of genes incorporated (LRpath-AF). Two well-studied lung cancer datasets were also analyzed in the same manner to demonstrate the validity of the method, and similar results were obtained. Conclusions: We introduce an alternative way to integrate KEGG PATHWAY information into gene set enrichment testing. The performance of GSEA and LRpath can be enhanced with the integration of appearance frequency of genes.
We conclude that, generally, gene set analysis methods that integrate information from KEGG PATHWAY perform better both statistically and biologically. http://deepblue.lib.umich.edu/bitstream/2027.42/112430/1/12859_2010_Article_4457.pd
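
The abstract above does not say how appearance frequency is turned into a gene weight. As a rough sketch only, and assuming an inverse-document-frequency style weight borrowed from information retrieval (our assumption, not necessarily the authors' formula), a gene that appears in many pathways can be down-weighted as follows. The function name `appearance_frequency_weights` and the toy pathway data are hypothetical.

```python
import math
from collections import Counter


def appearance_frequency_weights(pathways):
    """Return an IDF-style weight per gene.

    weight(gene) = log(N / n_gene), where N is the number of pathways and
    n_gene is the number of pathways that contain the gene. This weighting
    is an illustrative assumption, not the formula from the paper.
    """
    n_pathways = len(pathways)
    # Count each gene at most once per pathway.
    counts = Counter(gene for genes in pathways.values() for gene in set(genes))
    return {gene: math.log(n_pathways / count) for gene, count in counts.items()}


# Hypothetical toy pathways: GENE1 behaves like a housekeeping gene.
pathways = {
    "pathway_a": ["GENE1", "GENE2"],
    "pathway_b": ["GENE1", "GENE3"],
    "pathway_c": ["GENE1", "GENE4"],
}
weights = appearance_frequency_weights(pathways)
```

Under this assumed weighting, a gene present in every pathway receives weight zero, while a gene unique to one pathway receives the largest weight, log(N).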

    Network analysis of genes regulated in renal diseases: implications for a molecular-based classification

    Background: Chronic renal diseases are currently classified based on morphological similarities such as whether they produce predominantly inflammatory or non-inflammatory responses. However, such classifications do not reliably predict the course of the disease and its response to therapy. In contrast, recent studies in diseases such as breast cancer suggest that a classification which includes molecular information could lead to more accurate diagnoses and prediction of treatment response. This article describes how we extracted gene expression profiles from biopsies of patients with chronic renal diseases, and used network visualizations and associated quantitative measures to rapidly analyze similarities and differences between the diseases. Results: The analysis revealed three main regularities: (1) Many genes associated with a single disease, and fewer genes associated with many diseases. (2) Unexpected combinations of renal diseases that share relatively large numbers of genes. (3) Uniform concordance in the regulation of all genes in the network. Conclusion: The overall results suggest the need to define a molecular-based classification of renal diseases, in addition to hypotheses for the unexpected patterns of shared genes and the uniformity in gene concordance. Furthermore, the results demonstrate the utility of network analyses to rapidly understand complex relationships between diseases and regulated genes. http://deepblue.lib.umich.edu/bitstream/2027.42/112463/1/12859_2009_Article_3354.pd

    D-SPACE4Cloud: A Design Tool for Big Data Applications

    Recent years have seen a steep rise in data generation worldwide, with the development and widespread adoption of several software projects targeting the Big Data paradigm. Many companies currently engage in Big Data analytics as part of their core business activities; nonetheless, there are no tools and techniques to support the design of the underlying hardware configuration backing such systems. In particular, this report focuses on Cloud-deployed clusters, which represent a cost-effective alternative to on-premises installations. We propose a novel tool implementing a battery of optimization and prediction techniques, integrated so as to efficiently assess several alternative resource configurations and determine the minimum-cost cluster deployment satisfying QoS constraints. Further, the experimental campaign conducted on real systems shows the validity and relevance of the proposed method.

    The network structure of visited locations according to geotagged social media photos

    Businesses, tourism attractions, public transportation hubs and other points of interest are not isolated but part of a collaborative system. Making such a collaborative network surface is not always an easy task. The existence of data-rich environments can assist in the reconstruction of collaborative networks. They shed light on how their members operate and reveal a potential for value creation via collaborative approaches. Social media data are an example of a means to accomplish this task. In this paper, we reconstruct a network of tourist locations using fine-grained data from Flickr, an online community for photo sharing. We have used a publicly available set of Flickr data provided by Yahoo! Labs. To analyse the complex structure of tourism systems, we have reconstructed a network of visited locations in Europe, resulting in around 180,000 vertices and over 32 million edges. An analysis of the resulting network properties reveals its complex structure. Comment: 8 pages, 3 figures

    Modeling performance of Hadoop applications: A journey from queueing networks to stochastic well formed nets

    Nowadays, many enterprises commit to the extraction of actionable knowledge from huge datasets as part of their core business activities. Applications belong to very different domains such as fraud detection or one-to-one marketing, and encompass business analytics and support to decision making in both private and public sectors. In these scenarios, a central place is held by the MapReduce framework and in particular its open source implementation, Apache Hadoop. In such environments, new challenges arise in the area of job performance prediction, with the need to provide Service Level Agreement guarantees to the end user and to avoid waste of computational resources. In this paper we provide performance analysis models to estimate MapReduce job execution times in Hadoop clusters governed by the YARN Capacity Scheduler. We propose models of increasing complexity and accuracy, ranging from queueing networks to stochastic well formed nets, able to estimate job performance under a number of scenarios of interest, including unreliable resources. The accuracy of our models is evaluated by considering the TPC-DS industry benchmark in experiments run on Amazon EC2 and the CINECA Italian supercomputing center. The results have shown that the average accuracy we can achieve is in the range 9–14%.

    Algorithms and Bounds for Drawing Directed Graphs

    In this paper we present a new approach to visualize directed graphs and their hierarchies that completely departs from the classical four-phase framework of Sugiyama and computes readable hierarchical visualizations that contain the complete reachability information of a graph. Additionally, our approach has the advantage that only the necessary edges are drawn in the drawing, thus reducing the visual complexity of the resulting drawing. Furthermore, most problems involved in our framework require only polynomial time. Our framework offers a suite of solutions depending upon the requirements, and it consists of only two steps: (a) the cycle removal step (if the graph contains cycles) and (b) the channel decomposition and hierarchical drawing step. Our framework does not introduce any dummy vertices and it keeps the vertices of a channel vertically aligned. The time complexity of the main drawing algorithms of our framework is O(kn), where k is the number of channels, typically much smaller than n (the number of vertices). Comment: Appears in the Proceedings of the 26th International Symposium on Graph Drawing and Network Visualization (GD 2018)

    Understanding the adoption of business analytics and intelligence

    Cruz-Jesus, F., Oliveira, T., & Naranjo, M. (2018). Understanding the adoption of business analytics and intelligence. In Á. Rocha, H. Adeli, L. P. Reis, & S. Costanzo (Eds.), Trends and Advances in Information Systems and Technologies, pp. 1094-1103. (Advances in Intelligent Systems and Computing; Vol. 745). Springer Verlag. DOI: 10.1007/978-3-319-77703-0_106
    Our work addresses the factors that influence the adoption of business analytics and intelligence (BAI) among firms. Grounded in some of the most prominent adoption models for technological innovations, we developed a conceptual model especially suited for BAI. Based on this model, we propose an instrument from which relevant hypotheses will be derived and tested by means of statistical analysis. We hope that the findings derived from our analysis may offer important insights for practitioners and researchers regarding the drivers that lead to BAI adoption in firms. Although other studies have already focused on the adoption of technological innovations by firms, research on BAI is scarce, hence the relevance of our research.